Pitch-Adaptive Front-End Features for Robust Children's ASR
Abstract
In this work, we explore some of the challenges in recognizing children’s speech with automatic speech recognition (ASR) systems developed using adults’ speech. In such mismatched ASR tasks, recognition performance is severely degraded due to the gross mismatch in acoustic attributes between the two groups of speakers. Among the various sources of mismatch, we focus on the large differences in average pitch between adult and child speakers. Earlier studies have shown that the Mel-filterbank employed in feature extraction is not able to smooth out the pitch harmonics sufficiently, particularly for high-pitched child speakers. As a result, the acoustic features derived for adult and child speakers turn out to be significantly mismatched. To address this problem, we propose a simple technique based on adaptive liftering for deriving pitch-robust features. This enables us to reduce the sensitivity of the acoustic features to gross variations in pitch across speakers. The proposed features are found to result in improved performance in the context of a deep neural network based ASR system. Further, with the use of existing feature normalization techniques, additional gains are noted.
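The abstract does not spell out the liftering procedure, so the following is only a minimal sketch of one generic way a pitch-adaptive low-time lifter could smooth out pitch harmonics, not the authors' method. It assumes the pitch of each frame is already known, and it assumes the lifter cutoff is set to a fixed fraction of the pitch period; the function names, the `margin` value, and the FFT size are all hypothetical choices made for illustration.

```python
import numpy as np

def lifter_cutoff(sample_rate, pitch_hz, margin=0.8):
    """Cutoff quefrency in samples, placed below the pitch period (assumed rule)."""
    pitch_period = sample_rate / pitch_hz  # pitch period in samples
    return max(1, int(margin * pitch_period))

def pitch_adaptive_smooth(frame, sample_rate, pitch_hz, n_fft=512, margin=0.8):
    """Return the raw and the liftered (smoothed) log-magnitude spectrum of a frame.

    The frame is assumed to be no longer than n_fft samples.
    """
    windowed = frame * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed, n_fft)) + 1e-10)
    # Real cepstrum: inverse FFT of the log-magnitude spectrum.
    cepstrum = np.fft.irfft(log_mag, n_fft)
    # Keep only quefrencies below the cutoff; the pitch peak lies above it.
    cutoff = min(lifter_cutoff(sample_rate, pitch_hz, margin), n_fft // 2)
    lifter = np.zeros(n_fft)
    lifter[:cutoff] = 1.0
    if cutoff > 1:
        lifter[n_fft - cutoff + 1:] = 1.0  # mirror half of the symmetric cepstrum
    smoothed = np.fft.rfft(cepstrum * lifter).real
    return log_mag, smoothed

# Synthetic high-pitched frame: 25 ms at 16 kHz with a 300 Hz fundamental.
sample_rate = 16000
t = np.arange(400) / sample_rate
f0 = 300.0
frame = sum(np.sin(2 * np.pi * f0 * k * t) / k for k in range(1, 21))
raw, smooth = pitch_adaptive_smooth(frame, sample_rate, f0)
```

Because the cutoff shrinks as the pitch rises, a high-pitched child frame gets a shorter lifter than a low-pitched adult frame, which is the sense in which the smoothing adapts to pitch. The smoothed spectrum could then be passed to a Mel-filterbank in place of the raw one.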
Similar resources
Uncertainty Decoding with Adaptive Sampling for Noise Robust DNN-Based Acoustic Modeling
Although deep neural network (DNN) based acoustic models have obtained remarkable results, automatic speech recognition (ASR) performance still remains low in noisy and reverberant conditions. To address this issue, a speech enhancement front-end is often used before recognition to reduce noise. However, the front-end cannot fully suppress noise and often introduces artifacts that are limit...
Perfect Tracking of Supercavitating Non-minimum Phase Vehicles Using a New Robust and Adaptive Parameter-optimal Iterative Learning Control
In this manuscript, a new method is proposed to provide perfect tracking of the supercavitation system based on a new two-state model. The aim is to track the pitch rate and angle of attack for fin and cavitator inputs. The pitch rate of the supercavitating vehicle with respect to the fin angle is found to exhibit non-minimum phase behavior. This effect reduces the speed of the commanded pitch rate. Control...
A Study on the Effect of Pitch on LPCC and PLPC Features for Children's ASR in Comparison to MFCC
In this work, following our previous studies, we study and quantify the effect of pitch on LPCC and PLPC features and explore their efficacy for children’s mismatched ASR in comparison to MFCC. Our analysis shows that, unlike MFCC, the LPCC feature is not majorly influenced by pitch variations. On the other hand, similar to MFCC, though PLPC is also found to be significantly affected by pitch variatio...
Matching the Acoustic Model to Front-End Signal Processing for ASR in Noisy and Reverberant Environments
Distant-talking automatic speech recognition (ASR) represents an extremely challenging task. The major reason is that unwanted additive interference and reverberation are picked up by the microphones besides the desired signal. A hands-free human-machine interface should therefore comprise a powerful acoustic preprocessing unit in line with a robust ASR back-end. However, since perfect speech e...
Enhancement of noisy speech for noise robust front-end and speech reconstruction at back-end of DSR system
This paper presents a speech enhancement method for a noise-robust front-end and speech reconstruction at the back-end of Distributed Speech Recognition (DSR). The speech noise removal algorithm is based on two-stage noise filtering (LSAHT), using a log spectral amplitude (LSA) speech estimator and harmonic tunneling (HT) prior to feature extraction. The noise reduced features are transmitted with some...